Stochastic Lexicalized Tree-adjoining Grammars
Abstract
The notion of stochastic lexicalized tree-adjoining grammar (SLTAG) is formally defined. The parameters of a SLTAG correspond to the probability of combining two structures, each one associated with a word. The characteristics of SLTAG are unique and novel since it is lexically sensitive (like N-gram models or Hidden Markov Models) and yet hierarchical (like stochastic context-free grammars). Then, two basic algorithms for SLTAG are introduced: an algorithm for computing the probability of a sentence generated by a SLTAG and an inside-outside-like iterative algorithm for estimating the parameters of a SLTAG given a training corpus. Finally, we show how SLTAG makes it possible to define a lexicalized version of stochastic context-free grammars, and we report preliminary experiments showing some of the advantages of SLTAG over stochastic context-free grammars.

1 Motivations

Although stochastic techniques applied to syntax modeling have recently regained popularity, current language models suffer from obvious inherent inadequacies. Early proposals such as Markov Models, N-gram models (Pratt, 1942; Shannon, 1948; Shannon, 1951) and Hidden Markov Models were very quickly shown to be linguistically inappropriate for natural language (e.g. Chomsky (1964, pages 13-18)) since they are unable to capture long-distance dependencies or to describe the syntax of natural languages hierarchically. Stochastic context-free grammar (Booth, 1969) is a hierarchical model more appropriate for natural languages; however, none of these proposals (Lari and Young, 1990; Jelinek, Lafferty, and Mercer, 1990) performs as well as the simpler Markov Models because of the difficulty of capturing lexical information.
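The difficulty of capturing lexical information in a stochastic context-free grammar can be illustrated with a toy example. The sketch below is only an illustration under assumed values: the two-rule lexicon, the probabilities and the function names are hypothetical and do not come from the experiments reported here. Because each rule is applied independently of the others, the probability of a sentence factorizes into one term per rule, so the grammar cannot prefer "dogs bark" over "dog bark".

```python
from itertools import product

# Hypothetical toy grammar: S -> NP VP with probability 1.0,
# plus lexical rules for NP and VP (all values are made up).
NP_RULES = {"dog": 0.7, "dogs": 0.3}
VP_RULES = {"barks": 0.6, "bark": 0.4}


def sentence_prob(subject: str, verb: str) -> float:
    """Probability of the single derivation S -> NP VP -> subject verb."""
    return 1.0 * NP_RULES[subject] * VP_RULES[verb]


def all_sentence_probs() -> dict:
    """Probability of every two-word sentence the toy grammar generates."""
    return {(s, v): sentence_prob(s, v) for s, v in product(NP_RULES, VP_RULES)}


if __name__ == "__main__":
    for (s, v), p in all_sentence_probs().items():
        print(f"{s} {v}: {p:.2f}")
```

With these assumed values the ungrammatical "dog bark" (0.7 x 0.4) is more probable than the grammatical "dogs bark" (0.3 x 0.4). No choice of rule probabilities can encode subject-verb agreement in this grammar, because the subject and the verb are chosen by two separate rules.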
The parameters of a stochastic context-free grammar do not correspond directly to a distribution over words, since distributional phenomena over words that are embodied by the application of more than one context-free rule cannot be captured under the context-freeness assumption. This leads to the difficulty of maintaining a standard hierarchical model while capturing lexical dependencies. This fact prompted researchers in natural language processing to give up hierarchical language models in favor of non-hierarchical statistical models over words (such as word N-gram models). Probably for lack of a better language model, it has also been argued that the phenomena that such devices cannot capture occur relatively infrequently. Such argumentation is linguistically not sound.

*This work was partially supported by DARPA Grant N001490-31863, ARO Grant DAAL03-89-C-0031 and NSF Grant 1RI9016592. We thank Aravind Joshi for suggesting the use of TAGs for statistical analysis during a private discussion that followed a presentation by Fred Jelinek during the June 1990 meeting of the DARPA Speech and Natural Language Workshop. We are also grateful to Peter Braun, Fred Jelinek, Mark Liberman, Mitch Marcus, Robert Mercer, Fernando Pereira and Stuart Shieber for providing valuable comments.

Lexicalized tree-adjoining grammars (LTAG) combine hierarchical structures while being lexically sensitive and are therefore more appropriate for statistical analysis of language. In fact, LTAGs are the simplest hierarchical formalism which can serve as the basis for lexicalizing context-free grammar (Schabes, 1990; Joshi and Schabes, 1991). LTAG is a tree-rewriting system that combines trees of large domain with adjoining and substitution.
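To make the two combining operations concrete, here is a minimal Python sketch of substitution and adjoining on toy trees. The `Node` class, the helper names and the elementary trees for "John", "sleeps" and "soundly" are illustrative assumptions, not trees taken from the paper's lexicon, and the sketch mutates trees in place rather than copying them as a real parser would.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # frontier node marked for substitution
    foot: bool = False   # foot node of an auxiliary tree


def leaves(node: Node) -> List[str]:
    """Left-to-right frontier labels of the tree rooted at node."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in leaves(child)]


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, if any."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(site: Node, initial: Node) -> None:
    """Plug an initial tree into a substitution site with the same label."""
    assert site.subst and site.label == initial.label
    site.children = initial.children
    site.subst = False


def adjoin(target: Node, aux: Node) -> None:
    """Splice an auxiliary tree into the tree at an interior node."""
    foot = find_foot(aux)
    assert foot is not None and foot is not aux
    assert foot.label == aux.label == target.label
    foot.children = target.children  # excised subtree hangs below the foot
    foot.foot = False
    target.children = aux.children   # auxiliary tree takes the target's place


def demo() -> str:
    # Hypothetical elementary trees, each anchored by one word.
    np_slot = Node("NP", subst=True)
    vp = Node("VP", [Node("V", [Node("sleeps")])])
    sleeps = Node("S", [np_slot, vp])
    john = Node("NP", [Node("John")])
    soundly = Node("VP", [Node("VP", foot=True), Node("Adv", [Node("soundly")])])

    substitute(np_slot, john)  # fills the subject position
    adjoin(vp, soundly)        # inserts the modifier at VP
    return " ".join(leaves(sleeps))


if __name__ == "__main__":
    print(demo())
```

In this sketch every elementary tree carries exactly one word, so each `substitute` or `adjoin` call combines two word-anchored structures. Events of this kind are what the abstract describes the SLTAG parameters as assigning probabilities to.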
The trees found in a TAG take advantage of the available extended domain of locality by localizing syntactic dependencies (such as filler-gap, subject-verb, verb-object) and most semantic dependencies (such as predicate-argument relationship). For example, the following trees can be found in a LTAG lexicon:
Similar resources
Extraction of Tree Adjoining Grammars from a Treebank for Korean
We present the implementation of a system which extracts not only lexicalized grammars but also feature-based lexicalized grammars from Korean Sejong Treebank. We report on some practical experiments where we extract TAG grammars and tree schemata. Above all, full-scale syntactic tags and well-formed morphological analysis in Sejong Treebank allow us to extract syntactic features. In addition, ...
Stochastic Tree-Adjoining Grammars
The notion of stochastic lexicalized tree-adjoining grammar (SLTAG) is defined and basic algorithms for SLTAG are designed. The parameters of a SLTAG correspond to the probability of combining two structures each one associated with a word. The characteristics of SLTAG are unique and novel since it is lexically sensitive (as N-gram models or Hidden Markov Models) and yet hierarc...
Verification of Lexicalized Tree Adjoining Grammars
One approach to verification and validation of language processing systems includes the verification of system resources. In general, the grammar is a key resource in such systems. In this paper we discuss verification of lexicalized tree adjoining grammars (LTAGs) (Joshi and Schabes, 1997) as one instance of a system resource, and as one phase of a larger verification effort.
Exploring the Underspecified World of Lexicalized Tree Adjoining Grammars
This paper presents a precise characterization of the underspecification found in Lexicalized Tree Adjoining Grammars, and shows that, in a sense, the same degree of underspecification is found in Lexicalized D-Tree Substitution Grammars. Rather than describing directly the nature of the elementary objects of the grammar, we achieve our objective by formalizing the way in which underspecification ...
Tree-Adjoining Grammars Are Not Closed Under Strong Lexicalization
A lexicalized tree-adjoining grammar is a tree-adjoining grammar where each elementary tree contains some overt lexical item. Such grammars are being used to give lexical accounts of syntactic phenomena, where an elementary tree defines the domain of locality of the syntactic and semantic dependencies of its lexical items. It has been claimed in the literature that for every tree-adjoining gram...
Multiple Context-Free Tree Grammars and Multi-component Tree Adjoining Grammars
Strong lexicalization is the process of turning a grammar generating trees into an equivalent one, in which all rules contain a terminal leaf. It is known that tree adjoining grammars cannot be strongly lexicalized, whereas the more powerful simple context-free tree grammars can. It is demonstrated that multiple simple context-free tree grammars are as expressive as multi-component tree adjoini...